{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Building an RNN in PyTorch\n",
    "\n",
    "In this notebook, I'll construct a character-level RNN with PyTorch. If you are unfamiliar with character-level RNNs, check out [this great article](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) by Andrej Karpathy. The network will train character by character on some text, then generate new text character by character. As an example, I will train on Anna Karenina, one of my favorite novels. I call this project Anna KaRNNa."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import torch\n",
    "from torch import nn\n",
    "import torch.nn.functional as F\n",
    "from torch.autograd import Variable"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "with open('anna.txt', 'r') as f:\n",
    "    text = f.read()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we have the text, encode it as integers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "chars = tuple(set(text))\n",
    "int2char = dict(enumerate(chars))\n",
    "char2int = {ch: ii for ii, ch in int2char.items()}\n",
    "encoded = np.array([char2int[ch] for ch in text])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Processing the data\n",
    "\n",
    "We're one-hot encoding the data, so I'll make a function to do that.\n",
    "\n",
    "I'll also create mini-batches for training. We'll take the encoded characters and split them into multiple sequences, given by `n_seqs` (also refered to as \"batch size\" in other places). Each of those sequences will be `n_steps` long."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def one_hot_encode(arr, n_labels):\n",
    "    \n",
    "    # Initialize the the encoded array\n",
    "    one_hot = np.zeros((np.multiply(*arr.shape), n_labels), dtype=np.float32)\n",
    "    \n",
    "    # Fill the appropriate elements with ones\n",
    "    one_hot[np.arange(one_hot.shape[0]), arr.flatten()] = 1.\n",
    "    \n",
    "    # Finally reshape it to get back to the original array\n",
    "    one_hot = one_hot.reshape((*arr.shape, n_labels))\n",
    "    \n",
    "    return one_hot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def get_batches(arr, n_seqs, n_steps):\n",
    "    '''Create a generator that returns mini-batches of size\n",
    "       n_seqs x n_steps from arr.\n",
    "    '''\n",
    "    \n",
    "    batch_size = n_seqs * n_steps\n",
    "    n_batches = len(arr)//batch_size\n",
    "    \n",
    "    # Keep only enough characters to make full batches\n",
    "    arr = arr[:n_batches * batch_size]\n",
    "    # Reshape into n_seqs rows\n",
    "    arr = arr.reshape((n_seqs, -1))\n",
    "    \n",
    "    for n in range(0, arr.shape[1], n_steps):\n",
    "        # The features\n",
    "        x = arr[:, n:n+n_steps]\n",
    "        # The targets, shifted by one\n",
    "        y = np.zeros_like(x)\n",
    "        try:\n",
    "            y[:, :-1], y[:, -1] = x[:, 1:], arr[:, n+n_steps]\n",
    "        except IndexError:\n",
    "            y[:, :-1], y[:, -1] = x[:, 1:], arr[:, 0]\n",
    "        yield x, y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Defining the network with PyTorch\n",
    "\n",
    "Here I'll use PyTorch to define the architecture of the network. We start by defining the layers and operations we want. Then, define a method for the forward pass. I'm also going to write a method for predicting characters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class CharRNN(nn.Module):\n",
    "    def __init__(self, tokens, n_steps=100, n_hidden=256, n_layers=2,\n",
    "                               drop_prob=0.5, lr=0.001):\n",
    "        super().__init__()\n",
    "        self.drop_prob = drop_prob\n",
    "        self.n_layers = n_layers\n",
    "        self.n_hidden = n_hidden\n",
    "        self.lr = lr\n",
    "        \n",
    "        self.chars = tokens\n",
    "        self.int2char = dict(enumerate(self.chars))\n",
    "        self.char2int = {ch: ii for ii, ch in self.int2char.items()}\n",
    "        \n",
    "        self.dropout = nn.Dropout(drop_prob)\n",
    "        self.lstm = nn.LSTM(len(self.chars), n_hidden, n_layers, \n",
    "                            dropout=drop_prob, batch_first=True)\n",
    "        self.fc = nn.Linear(n_hidden, len(self.chars))\n",
    "        \n",
    "        self.init_weights()\n",
    "        \n",
    "    def forward(self, x, hc):\n",
    "        ''' Forward pass through the network '''\n",
    "        \n",
    "        x, (h, c) = self.lstm(x, hc)\n",
    "        x = self.dropout(x)\n",
    "        \n",
    "        # Stack up LSTM outputs\n",
    "        x = x.view(x.size()[0]*x.size()[1], self.n_hidden)\n",
    "        \n",
    "        x = self.fc(x)\n",
    "        \n",
    "        return x, (h, c)\n",
    "    \n",
    "    def predict(self, char, h=None, cuda=False, top_k=None):\n",
    "        ''' Given a character, predict the next character.\n",
    "        \n",
    "            Returns the predicted character and the hidden state.\n",
    "        '''\n",
    "        if cuda:\n",
    "            self.cuda()\n",
    "        else:\n",
    "            self.cpu()\n",
    "        \n",
    "        if h is None:\n",
    "            h = self.init_hidden(1)\n",
    "        \n",
    "        x = np.array([[self.char2int[char]]])\n",
    "        x = one_hot_encode(x, len(self.chars))\n",
    "        inputs = Variable(torch.from_numpy(x), volatile=True)\n",
    "        if cuda:\n",
    "            inputs = inputs.cuda()\n",
    "        \n",
    "        h = tuple([Variable(each.data, volatile=True) for each in h])\n",
    "        out, h = self.forward(inputs, h)\n",
    "\n",
    "        p = F.softmax(out).data\n",
    "        if cuda:\n",
    "            p = p.cpu()\n",
    "        \n",
    "        if top_k is None:\n",
    "            top_ch = np.arange(len(self.chars))\n",
    "        else:\n",
    "            p, top_ch = p.topk(top_k)\n",
    "            top_ch = top_ch.numpy().squeeze()\n",
    "        \n",
    "        p = p.numpy().squeeze()\n",
    "        char = np.random.choice(top_ch, p=p/p.sum())\n",
    "            \n",
    "        return self.int2char[char], h\n",
    "    \n",
    "    def init_weights(self):\n",
    "        ''' Initialize weights for fully connected layer '''\n",
    "        initrange = 0.1\n",
    "        \n",
    "        # Set bias tensor to all zeros\n",
    "        self.fc.bias.data.fill_(0)\n",
    "        # FC weights as random uniform\n",
    "        self.fc.weight.data.uniform_(-1, 1)\n",
    "        \n",
    "    def init_hidden(self, n_seqs):\n",
    "        ''' Initializes hidden state '''\n",
    "        # Create two new tensors with sizes n_layers x n_seqs x n_hidden,\n",
    "        # initialized to zero, for hidden state and cell state of LSTM\n",
    "        weight = next(self.parameters()).data\n",
    "        return (Variable(weight.new(self.n_layers, n_seqs, self.n_hidden).zero_()),\n",
    "                Variable(weight.new(self.n_layers, n_seqs, self.n_hidden).zero_()))\n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def train(net, data, epochs=10, n_seqs=10, n_steps=50, lr=0.001, clip=5, val_frac=0.1, cuda=False, print_every=10):\n",
    "    ''' Traing a network \n",
    "    \n",
    "        Arguments\n",
    "        ---------\n",
    "        \n",
    "        net: CharRNN network\n",
    "        data: text data to train the network\n",
    "        epochs: Number of epochs to train\n",
    "        n_seqs: Number of mini-sequences per mini-batch, aka batch size\n",
    "        n_steps: Number of character steps per mini-batch\n",
    "        lr: learning rate\n",
    "        clip: gradient clipping\n",
    "        val_frac: Fraction of data to hold out for validation\n",
    "        cuda: Train with CUDA on a GPU\n",
    "        print_every: Number of steps for printing training and validation loss\n",
    "    \n",
    "    '''\n",
    "    \n",
    "    net.train()\n",
    "    opt = torch.optim.Adam(net.parameters(), lr=lr)\n",
    "    criterion = nn.CrossEntropyLoss()\n",
    "    \n",
    "    # create training and validation data\n",
    "    val_idx = int(len(data)*(1-val_frac))\n",
    "    data, val_data = data[:val_idx], data[val_idx:]\n",
    "    \n",
    "    if cuda:\n",
    "        net.cuda()\n",
    "    \n",
    "    counter = 0\n",
    "    n_chars = len(net.chars)\n",
    "    for e in range(epochs):\n",
    "        h = net.init_hidden(n_seqs)\n",
    "        for x, y in get_batches(data, n_seqs, n_steps):\n",
    "            counter += 1\n",
    "            \n",
    "            # One-hot encode our data and make them Torch tensors\n",
    "            x = one_hot_encode(x, n_chars)\n",
    "            x, y = torch.from_numpy(x), torch.from_numpy(y)\n",
    "            \n",
    "            inputs, targets = Variable(x), Variable(y)\n",
    "            if cuda:\n",
    "                inputs, targets = inputs.cuda(), targets.cuda()\n",
    "\n",
    "            # Creating new variables for the hidden state, otherwise\n",
    "            # we'd backprop through the entire training history\n",
    "            h = tuple([Variable(each.data) for each in h])\n",
    "\n",
    "            net.zero_grad()\n",
    "            \n",
    "            output, h = net.forward(inputs, h)\n",
    "            loss = criterion(output, targets.view(n_seqs*n_steps))\n",
    "\n",
    "            loss.backward()\n",
    "            \n",
    "            # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.\n",
    "            nn.utils.clip_grad_norm(net.parameters(), clip)\n",
    "\n",
    "            opt.step()\n",
    "            \n",
    "            if counter % print_every == 0:\n",
    "                \n",
    "                # Get validation loss\n",
    "                val_h = net.init_hidden(n_seqs)\n",
    "                val_losses = []\n",
    "                for x, y in get_batches(val_data, n_seqs, n_steps):\n",
    "                    # One-hot encode our data and make them Torch tensors\n",
    "                    x = one_hot_encode(x, n_chars)\n",
    "                    x, y = torch.from_numpy(x), torch.from_numpy(y)\n",
    "                    \n",
    "                    # Creating new variables for the hidden state, otherwise\n",
    "                    # we'd backprop through the entire training history\n",
    "                    val_h = tuple([Variable(each.data, volatile=True) for each in val_h])\n",
    "                    \n",
    "                    inputs, targets = Variable(x, volatile=True), Variable(y, volatile=True)\n",
    "                    if cuda:\n",
    "                        inputs, targets = inputs.cuda(), targets.cuda()\n",
    "\n",
    "                    output, val_h = net.forward(inputs, val_h)\n",
    "                    val_loss = criterion(output, targets.view(n_seqs*n_steps))\n",
    "                \n",
    "                    val_losses.append(val_loss.data[0])\n",
    "                \n",
    "                print(\"Epoch: {}/{}...\".format(e+1, epochs),\n",
    "                      \"Step: {}...\".format(counter),\n",
    "                      \"Loss: {:.4f}...\".format(loss.data[0]),\n",
    "                      \"Val Loss: {:.4f}\".format(np.mean(val_losses)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Time to train\n",
    "\n",
    "Now we can actually train the network. First we'll create the network itself, with some given hyperparameters. Then, define the mini-batches sizes (number of sequences and number of steps), and start the training. With the train function, we can set the number of epochs, the learning rate, and other parameters. Also, we can run the training on a GPU by setting `cuda=True`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "if 'net' in locals():\n",
    "    del net"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "net = CharRNN(chars, n_hidden=512, n_layers=2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "n_seqs, n_steps = 128, 100\n",
    "train(net, encoded, epochs=25, n_seqs=n_seqs, n_steps=n_steps, lr=0.001, cuda=True, print_every=10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting the best model\n",
    "\n",
    "To set your hyperparameters to get the best performance, you'll want to watch the training and validation losses. If your training loss is much lower than the validation loss, you're overfitting. Increase regularization (more dropout) or use a smaller network. If the training and validation losses are close, you're underfitting so you can increase the size of the network."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After training, we'll save the model so we can load it again later if we need too. Here I'm saving the parameters needed to create the same architecture, the hidden layer hyperparameters and the text characters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "checkpoint = {'n_hidden': net.n_hidden,\n",
    "              'n_layers': net.n_layers,\n",
    "              'state_dict': net.state_dict(),\n",
    "              'tokens': net.chars}\n",
    "with open('rnn.net', 'wb') as f:\n",
    "    torch.save(checkpoint, f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sampling\n",
    "\n",
    "Now that the model is trained, we'll want to sample from it. To sample, we pass in a character and have the network predict the next character. Then we take that character, pass it back in, and get another predicted character. Just keep doing this and you'll generate a bunch of text!\n",
    "\n",
    "### Top K sampling\n",
    "\n",
    "Our predictions come from a categorcial probability distribution over all the possible characters. We can make the sampled text more reasonable but less variable by only considering some $K$ most probable characters. This will prevent the network from giving us completely absurd characters while allowing it to introduce some noise and randomness into the sampled text.\n",
    "\n",
    "Typically you'll want to prime the network so you can build up a hidden state. Otherwise the network will start out generating characters at random. In general the first bunch of characters will be a little rough since it hasn't built up a long history of characters to predict from."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def sample(net, size, prime='The', top_k=None, cuda=False):\n",
    "        \n",
    "    if cuda:\n",
    "        net.cuda()\n",
    "    else:\n",
    "        net.cpu()\n",
    "\n",
    "    net.eval()\n",
    "    \n",
    "    # First off, run through the prime characters\n",
    "    chars = [ch for ch in prime]\n",
    "    h = net.init_hidden(1)\n",
    "    for ch in prime:\n",
    "        char, h = net.predict(ch, h, cuda=cuda, top_k=top_k)\n",
    "\n",
    "    chars.append(char)\n",
    "    \n",
    "    # Now pass in the previous character and get a new one\n",
    "    for ii in range(size):\n",
    "        char, h = net.predict(chars[-1], h, cuda=cuda, top_k=top_k)\n",
    "        chars.append(char)\n",
    "\n",
    "    return ''.join(chars)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Anna, and she seemed to\n",
      "have a tears of himself, but as though he did not tell them his face. He\n",
      "was not settled at him, and with whom he had already sat down, and that\n",
      "it was impossible to say.\"\n",
      "\n",
      "\"That's not somewhere the point to her.\" They despring them at that\n",
      "mistake and him. The marshes was the chincer, was in a glance of\n",
      "assisted at the corricted service. But when she had so given on the\n",
      "presence.\n",
      "\n",
      "\"I have been always to the than except into a bare, but there,\n",
      "as it is to be supposed.\" Again he they had not all to be\n",
      "about the mistares and thousands as so towards it, and what the\n",
      "mather was to the carriage with silence, and walked a love with which\n",
      "he had true, that he had an acquaintance. The creeming\n",
      "sense of the colorel sorts, and at the comminterness of his own\n",
      "her hands, and had belonged to the peasants, he had a country peasant\n",
      "his breathed handed, he sat sight of the same time as his friends, to see\n",
      "where the household has taken in out of the same time and his sister\n",
      "in a long wife.\n",
      "\n",
      "\"I'm not going, and you must be definite that, and I've taken up in the matter,\n",
      "that the perplehey was to go and should be doing. This was an\n",
      "official serious towards the same ways, and shaking it the child.\n",
      "\n",
      "\"I should have sent on a prince,\" the point said, at some time, he would\n",
      "be thinking of that there was so a good first and happy before her former\n",
      "and staying at his breaking and, and the divorce, there was nothing and\n",
      "that struck any other self-conventional and annight for him\n",
      "and the misingry, and alone was a good hair, and talking away with\n",
      "her. And telling him to an except only a lighted attention of sorried\n",
      "soffed and her, and had so callly the province was sitting in the\n",
      "children at the trumples of the matter, struck him to see that he were\n",
      "to go out of the path that shoted a look of half-pate, was still all to\n",
      "holital. Sergey Ivanovitch was the same tell him with the contrary\n",
      "to the crowd of his coat and wondering that it had\n",
      "all suddened and so well of him.\n",
      "\n",
      "The so\n"
     ]
    }
   ],
   "source": [
    "print(sample(net, 2000, prime='Anna', top_k=5, cuda=False))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading a checkpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "with open('rnn.net', 'rb') as f:\n",
    "    checkpoint = torch.load(f)\n",
    "    \n",
    "loaded = CharRNN(checkpoint['tokens'], n_hidden=checkpoint['n_hidden'], n_layers=checkpoint['n_layers'])\n",
    "loaded.load_state_dict(checkpoint['state_dict'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "And Levin said to her.\n",
      "\n",
      "The simplest shame was not happy, and that the marshal, with her\n",
      "strange side would as a chance, and with his head horses, and the point of\n",
      "timp she fancied that something was stuck by the secrets were as he\n",
      "wanted, the common. They were continually\n",
      "to grief the clever, that with service of the second, this was his\n",
      "wife.\n",
      "\n",
      "\"You say it's supposing it, I'm all simpler, though I can't see you thank God\n",
      "in that clear woman went to the same to the sense of hands of\n",
      "a significance after things about her the servant where a shinest must\n",
      "doing the children that suddenly came to be a shall on the soul it will\n",
      "change yesterday!\" he asked, stranging at the post, but she said to him, and\n",
      "he stupid at him, and would have been deserved in all this time as her stop,\n",
      "throbbing the propinting hers in his eyes, as his white stock on the\n",
      "wind of the sound of the drawing room, he walked to the cases, he fell that\n",
      "she would be decided, and the prince was\n",
      "complicated them all at once, he came in the world. He stood as\n",
      "answer that she did not know to him; he was supposing\n",
      "an else in haste to him, his wife showed her as after\n",
      "social sturies, the most only\n",
      "of the same and her lips seemed to her. And the crimson went up\n",
      "for the conviction of the same trive to tell him of all them.\n",
      "\n",
      "\"Why is you take his horse in him.... I've been many poor of socious\n",
      "professs of him. If I could not speak by the patures above it?\" he said\n",
      "at the party of the person on the colore,\n",
      "tried to dinner the captar, a precious stand, shateful the\n",
      "forest had to be so anything as he was surprised, and the sound was\n",
      "towards into his fect in the compart. The carriage carried often all\n",
      "today they should be suddred and saying all the chief condection of\n",
      "it. A store of a peculiar fell work as the sick man would have been in\n",
      "condition to see in the side of home the low constarations that he\n",
      "should have asked it, but he was not to be so in society of the classes\n",
      "with such a stolen of surplance with an answers of think a\n"
     ]
    }
   ],
   "source": [
    "print(sample(loaded, 2000, cuda=True, top_k=5, prime=\"And Levin said\"))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
